Working Memory Capacity of ChatGPT: An Empirical Study
Working memory is a critical aspect of both human intelligence and artificial
intelligence, serving as a workspace for the temporary storage and manipulation
of information. In this paper, we systematically assess the working memory
capacity of ChatGPT (gpt-3.5-turbo), a large language model developed by
OpenAI, by examining its performance in verbal and spatial n-back tasks under
various conditions. Our experiments reveal that ChatGPT experiences significant
declines in performance as n increases (which necessitates more information to
be stored in working memory), suggesting a limit to the working memory capacity
strikingly similar to that of humans. Furthermore, we investigate the impact of
different instruction strategies on ChatGPT's performance and observe that the
fundamental patterns of a capacity limit persist. From our empirical findings,
we propose that n-back tasks may serve as tools for benchmarking the working
memory capacity of large language models and hold potential for informing
future efforts aimed at enhancing AI working memory and deepening our
understanding of human working memory through AI models.
Comment: 19 pages, 21 figures, 10 tables
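As a concrete illustration of the task family described above, a verbal n-back evaluation needs only a trial generator and a scorer. The sketch below is a minimal, hypothetical setup (the function names and consonant alphabet are illustrative; the paper's exact prompting protocol for ChatGPT differs):

```python
import random

def make_n_back_trials(n, length, alphabet="bcdfghjklmnpqrstvwxz", seed=0):
    """Generate a verbal n-back letter stream and its ground-truth answers.

    A trial is a "match" when the current letter equals the letter
    presented n positions earlier in the stream.
    """
    rng = random.Random(seed)
    letters = [rng.choice(alphabet) for _ in range(length)]
    answers = [i >= n and letters[i] == letters[i - n] for i in range(length)]
    return letters, answers

def score(predictions, answers):
    """Fraction of trials where the match/non-match call is correct."""
    hits = sum(p == a for p, a in zip(predictions, answers))
    return hits / len(answers)
```

Sweeping `n` upward while holding `length` fixed is what stresses working memory: each trial requires retaining the last `n` items.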
Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning
Prompt-based learning has been an effective paradigm for large pretrained
language models (LLM), enabling few-shot or even zero-shot learning. Black-box
prompt search has received growing interest recently for its distinctive
properties of gradient-free optimization, proven particularly useful and
powerful for model-as-a-service usage. However, the discrete nature and the
complexity of combinatorial optimization hinder the efficiency of modern
black-box approaches. Despite extensive research on search algorithms, the
crucial aspect of search space design and optimization has been largely
overlooked. In this paper, we first conduct a sensitivity analysis by prompting
LLM, revealing that only a small number of tokens exert a disproportionate
amount of influence on LLM predictions. Leveraging this insight, we propose the
Clustering and Pruning for Efficient Black-box Prompt Search (ClaPS), a simple
black-box search method that first clusters and prunes the search space to
focus exclusively on influential prompt tokens. By employing even simple search
methods within the pruned search space, ClaPS achieves state-of-the-art
performance across various tasks and LLMs, surpassing the performance of
complex approaches while significantly reducing search costs. Our findings
underscore the critical role of search space design and optimization in
enhancing both the usefulness and the efficiency of black-box prompt-based
learning.
Comment: Findings of EMNLP 2023. 10 pages, 5 figures, 4 tables (14 pages, 5 figures, 8 tables including references and appendices).
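The core idea of pruning the prompt search space before searching can be sketched in a few lines. This is a hypothetical simplification: the clustering step over token embeddings is omitted, the influence scores are assumed to come from the paper's sensitivity analysis, and the names below are not from the ClaPS implementation:

```python
def prune_search_space(tokens, influence, keep_ratio=0.1):
    """Keep only the highest-influence tokens; the rest of the
    vocabulary is pruned from the prompt search space."""
    ranked = sorted(tokens, key=lambda t: influence[t], reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return ranked[:k]

def greedy_prompt_search(pool, score_fn, prompt_len=3):
    """Build a discrete prompt token-by-token with plain greedy search
    inside the pruned pool -- even simple search suffices once the
    space is small."""
    prompt = []
    for _ in range(prompt_len):
        best = max(pool, key=lambda t: score_fn(prompt + [t]))
        prompt.append(best)
    return prompt
```

The point the abstract makes is visible in the structure: the expensive combinatorial search runs over `pool`, not over the full vocabulary.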
Explaining the Adaptive Generalisation Gap
We conjecture that the inherent difference in generalisation between adaptive
and non-adaptive gradient methods stems from the increased estimation noise in
the flattest directions of the true loss surface. We demonstrate that typical
schedules used for adaptive methods (with low numerical stability or damping
constants) serve to bias relative movement towards flat directions relative to
sharp directions, effectively amplifying the noise-to-signal ratio and harming
generalisation. We further demonstrate that the numerical stability/damping
constant used in these methods can be decomposed into a learning rate reduction
and linear shrinkage of the estimated curvature matrix. We then demonstrate
significant generalisation improvements by increasing the shrinkage
coefficient, closing the generalisation gap entirely in both Logistic
Regression and Deep Neural Network experiments. Finally, we show that other
popular modifications to adaptive methods, such as decoupled weight decay and
partial adaptivity, also calibrate parameter updates to make better use of
sharper, more reliable directions.
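The decomposition of the damping constant can be checked numerically for a diagonal curvature estimate. With damping δ, the identity (H + δI) = (1 + δ)(λH + (1 − λ)I) for λ = 1/(1 + δ) splits the damped step into a learning-rate reduction (factor λ) and a linear shrinkage of the curvature towards the identity. The sketch below assumes a diagonal preconditioner; the paper's exact parameterisation may differ:

```python
import numpy as np

def damped_update(grad, curvature, lr, delta):
    """Standard damped second-order step: lr * (H + delta*I)^-1 g,
    for a diagonal curvature estimate H."""
    return lr * grad / (curvature + delta)

def shrinkage_update(grad, curvature, lr, delta):
    """The same step rewritten as a learning-rate reduction plus a
    linear shrinkage of the curvature estimate towards the identity."""
    lam = 1.0 / (1.0 + delta)               # shrinkage coefficient
    shrunk = lam * curvature + (1.0 - lam)  # lam*H + (1-lam)*I, diagonal
    return (lr * lam) * grad / shrunk       # reduced learning rate lr*lam
```

Increasing the shrinkage coefficient (1 − λ) pulls the estimated curvature towards the identity, which is the knob the abstract reports closing the generalisation gap.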
Iterative Averaging in the Quest for Best Test Error
We analyse and explain the increased generalisation performance of iterate
averaging using a Gaussian process perturbation model between the true and
batch risk surface on the high dimensional quadratic. We derive three phenomena
from our theoretical results: (1) The importance of combining
iterate averaging (IA) with large learning rates and regularisation for
improved regularisation. (2) Justification for less frequent averaging. (3)
That we expect adaptive gradient methods to work equally well, or better, with
iterate averaging than their non-adaptive counterparts. Inspired by these
results, together with empirical investigations of the importance
of appropriate regularisation for the solution diversity of the iterates, we
propose two adaptive algorithms with iterate averaging. These give
significantly better results compared to stochastic gradient descent (SGD),
require less tuning and do not require early stopping or validation set
monitoring. We showcase the efficacy of our approach on the CIFAR-10/100,
ImageNet and Penn Treebank datasets on a variety of modern and classical
network architectures.
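Iterate averaging itself is simple to state in code. The sketch below is a hypothetical one-dimensional illustration, not the paper's algorithm: SGD on a noisy quadratic whose returned solution is a running mean of iterates sampled only every `avg_every` steps (the "less frequent averaging" the analysis justifies):

```python
import random

def sgd_with_iterate_averaging(grad_fn, w0, lr, steps, avg_every=10):
    """SGD whose returned solution is the running average of iterates
    sampled every `avg_every` steps."""
    w = w0
    avg, count = 0.0, 0
    for t in range(1, steps + 1):
        w = w - lr * grad_fn(w)
        if t % avg_every == 0:        # less frequent averaging
            count += 1
            avg += (w - avg) / count  # incremental running mean
    return avg

# toy noisy quadratic with true minimum at w* = 3
rng = random.Random(0)
grad = lambda w: (w - 3.0) + rng.gauss(0.0, 1.0)
w_avg = sgd_with_iterate_averaging(grad, w0=0.0, lr=0.1, steps=2000)
```

With a large constant learning rate the individual iterates keep bouncing around the minimum, while the average concentrates near it, which is the mechanism behind the improved test error.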
BOiLS: Bayesian Optimisation for Logic Synthesis
Optimising the quality-of-results (QoR) of circuits during logic synthesis is a formidable challenge necessitating the exploration of exponentially sized search spaces. While expert-designed operations aid in uncovering effective sequences, the increasing complexity of logic circuits favours automated procedures. To enable efficient and scalable solvers, we propose BOiLS, the first algorithm adapting Bayesian optimisation to navigate the space of synthesis operations. BOiLS requires no human intervention and trades off exploration versus exploitation through novel Gaussian process kernels and trust-region-constrained acquisitions. In a set of experiments on EPFL benchmarks, we demonstrate BOiLS's superior performance compared to the state of the art in terms of both sample efficiency and QoR values.
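The shape of the search problem — sequences of discrete synthesis operations, explored under a trust-region constraint — can be sketched without the Gaussian process machinery. In the stand-in below, the GP surrogate and acquisition function are replaced by simple greedy acceptance of mutated candidates; the operation names and all function names are illustrative, not from BOiLS:

```python
import random

OPS = ["balance", "rewrite", "refactor", "resub"]  # example synthesis ops

def propose(best, radius, rng):
    """Mutate the incumbent sequence within a trust region: at most
    `radius` positions may change."""
    cand = list(best)
    for i in rng.sample(range(len(cand)), rng.randint(1, radius)):
        cand[i] = rng.choice(OPS)
    return cand

def optimise(qor_fn, seq_len=5, budget=30, radius=2, seed=0):
    """Trust-region search over operation sequences; `qor_fn` plays the
    role of the (expensive) synthesis-and-measure black box."""
    rng = random.Random(seed)
    best = [rng.choice(OPS) for _ in range(seq_len)]
    best_q = qor_fn(best)
    for _ in range(budget):
        cand = propose(best, radius, rng)
        q = qor_fn(cand)
        if q > best_q:
            best, best_q = cand, q
    return best, best_q
```

In BOiLS proper, the greedy acceptance would be replaced by a GP surrogate over sequences with an acquisition maximised inside the trust region, which is what buys the reported sample efficiency.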